Transfer RNA (tRNA) is undoubtedly
the most central and one of the oldest 
molecules of the cell. Without it genetics 
and coded protein synthesis are impossible.
The crucial specificities responsible
for the genetic code and accurate translation
are largely entrusted to interactions
between tRNA and translation proteins,
fundamentally aminoacyl-tRNA synthetase
(aaRS) enzymes and elongation
factor (EF) switches (Yadavalli and Ibba, 
2012). Discrimination mediated by aaRSs 
and EFs against misincorporated tRNA 
and amino acids is at least 20 times more 
stringent than ribosomal recognition, 
editing, and other proofreading mecha- 
nisms (Reynolds et al., 2010). The fact 
that crucial genetic code specificities in 
highly selective interactions with protein 
enzymes do not involve the ribosomal 
ribonucleoprotein biosynthetic machin- 
ery challenges the "replicators first" origin 
of life scenario of an ancient RNA world 
(Caetano-Anolles and Seufferheld, 2013).
It also highlights the central functional, 
mechanistic, and evolutionary roles of 
tRNA and its recognition determinants, 
which enable coevolution between nucleic 
acids and proteins. These coevolutionary 
relationships are compatible with a late 
origin of the ribosome in its mechanism 
and not in protein biosynthesis, which was 
inferred from the computational analy- 
sis of thousands of RNAs and proteomes 
(Harish and Caetano-Anolles, 2012).
These analyses showed tight coevolution 
of ribosomal RNA (rRNA) and ribosomal 
proteins (r-proteins). While these rela- 
tionships delimit molecular makeup when 
organisms use translation to negotiate 
growth and viability amidst environmen- 
tal change, coevolution also constrains 
recruitment of the canonical L-shaped 
structure of the tRNA molecule into a mul- 
tiplicity of modern functions. These new 
functions include the synthesis of antibi- 
otics, bacterial cell wall peptidoglycans and 
tetrapyrroles, modification of bacterial 
membrane lipids, protein turnover, and 
the synthesis of other aminoacyl-tRNA 
molecules (Francklyn and Minajigi, 2010). 
Here we unfold coevolutionary relation- 
ships between tRNA substructures and 
translation proteins that embody crucial 
protein-nucleic acid interactions. We focus 
on a series of computational biology anal- 
yses of the structure and conformational 
diversity of tRNAs and their interacting 
proteins that provide information about 
the history of structural accretion of this 
"adaptor" molecule. Using this informa- 
tion, we place tRNA history within the 
framework of an evolutionary timeline of 
protein domain innovation, uncovering 
the natural history of tRNA within the 
context of the geological record. 

tRNA MOLECULES ARE OLD AND 
EVOLVE BY ACCRETION OF 
STRUCTURAL PARTS 

When studying the organismal distribu- 
tion of a catalog of over a thousand 
RNA families describing the modern RNA 
world, tRNA was found to be one of 
only five families that were universally 
present (Hoeppner et al., 2012). These 
families showed a strong vertical evo- 
lutionary trace and included rRNA and 
ribonuclease P (RNase P) RNA, which 
are present (with exceptions; e.g., Randau
et al., 2008) in all studied cellular organisms
and are minimally affected by horizontal
gene transfer. We note, however, that
RNA-free RNase P (Gutmann et al., 2012;
Taschner et al., 2012) can challenge RNase
P RNA ancestrality (Sun and Caetano-Anolles,
2010). The ubiquity of tRNA in
the cellular lineages of life and its central
molecular role provide strong support
for the very early origin of the molecule,
prompting the study of the origin and
evolution of the tRNA molecule using
information in its sequence and structure
(Fitch and Upper, 1987; Eigen et al.,
1989; Di Giulio, 1994; Sun and
Caetano-Anolles, 2008a; Farias, 2013). A
computational analysis of the history of tRNA
based on the structure of thousands of
molecules revealed that tRNAs evolve by
accretion of component parts (substructures)
and that the "top half" of tRNA
that includes the acceptor stem is more
ancient than the "bottom half" with its
anticodon arm (Sun and Caetano-Anolles,
2008a; reviewed in Sun and
Caetano-Anolles, 2008b) (Figure 1A). While other
models of evolutionary growth of the
tRNA molecule have been proposed (Di
Giulio, 2012), phylogenetic reconstructions
are compatible with biochemical evidence
of molecular recognition that makes
amino acid charging ancestral and molecularly
distant (~70 Å) from codon recognition,
which locates to more modern regions
of tRNA (Caetano-Anolles et al., 2013).
These findings revive the "genomic tag" 
hypothesis in which tRNA harbored ances- 
tral genomic information and the derived 
bottom half provided genetic code speci- 
ficity (Weiner and Maizels, 1987). 

FIGURE 1 | The natural history of tRNA inferred from nucleic
acid-protein interactions and structural phylogenomics. (A) The history
of tRNA portrays the history of its interactions with cognate aminoacyl-tRNA
synthetase (aaRS) protein enzymes. This is exemplified by the domains of
the tRNA and cysteinyl-tRNA synthetase binary complex (PDB entry 1U0B),
which are colored according to their age. The ancient "top half" of tRNA
embeds an "operational code" in the identity elements of the acceptor arm
that interact with the catalytic domain of aaRSs through classes I and II
modes of tRNA recognition. The evolutionarily recent "bottom half" of
tRNA holds the standard code in identity elements of the anticodon loop 
that interact with anticodon-binding domains of aaRSs. (B) Flow diagram 
showing the retrodiction strategy used to build phylogenetic trees of RNA 
molecules (ToMs) and associated trees of substructures (ToSs), and trees 
of protein domains (ToDs). The structures of RNA molecules are first 
decomposed into substructures. Structural features of substructures such 
as helical stem tracts and unpaired regions are coded as phylogenetic 
characters and assigned character states according to an evolutionary 
model that polarizes character transformation toward an increase in 
conformational order (character argumentation). Coded characters (s) are 
arranged in data matrices, which can be transposed. Phylogenetic analysis 
using maximum parsimony optimality criteria generates rooted ToMs and
ToSs. A census of domain structures in proteomes of hundreds of
completely sequenced organisms is used to compose data matrices, which
are then used to build ToDs. Elements of the matrix (g) represent genomic 
abundances of domain structures in proteomes, defined at different levels 
of classification of domain structure (e.g., SCOP folds, superfamilies, and
families). They are converted into multi-state phylogenetic characters with 
character states transforming according to linearly ordered and reversible 
pathways. Embedded in the trees of nucleic acids and proteins are 
timelines that assign age to molecular structures and associated functions. 
(C) The natural histories of tRNA and rRNA overlap when they are mapped
onto a timeline of protein domain history. A tree of tRNA substructures 
(ToS) was derived from statistical phylogenetic characters that define a 
molecular morphospace (the Shannon entropy of the base-pairing 
probability matrix, base-pairing propensity and mean length of stem 
structures) in 571 tRNA molecules. The optimal most parsimonious tree 
(43,281 steps; consistency index = 0.853, retention index = 0.654, 
rescaled consistency index = 0.557, g1 = -1.033) was recovered from a
branch-and-bound search. The most basal subtree of a ToS describing the 
evolution of the rRNA core (Harish and Caetano-Anolles, 2012) is also 
shown. Both trees are anchored to the geological record via an evolutionary 
timeline of first appearance of protein domains that are capable of 
establishing crucial interactions with the RNA molecules (see description in 
the main text). AC, anticodon; PTC, peptidyl transferase center.
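
The three ensemble indices quoted for the tree in panel C are related: under the standard parsimony definitions, the rescaled consistency index is the product of the consistency index and the retention index. The following is a minimal arithmetic check in Python that uses only the rounded values printed in the caption; the function and variable names are illustrative and are not taken from the original analysis.

```python
def rescaled_consistency_index(ci: float, ri: float) -> float:
    """Rescaled consistency index under the standard definition RC = CI * RI."""
    return ci * ri


# Rounded values reported in the caption for the tRNA tree of substructures.
CI_REPORTED = 0.853
RI_REPORTED = 0.654
RC_REPORTED = 0.557

rc_from_rounded = rescaled_consistency_index(CI_REPORTED, RI_REPORTED)

# Multiplying rounded inputs can differ from the reported RC in the third
# decimal place, because the reported RC derives from unrounded CI and RI.
discrepancy = abs(rc_from_rounded - RC_REPORTED)

print(f"RC from rounded CI and RI: {rc_from_rounded:.4f}")
print(f"Difference from reported RC: {discrepancy:.4f}")
```

The small difference is what one would expect if the reported indices were rounded independently from unrounded values.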

PHYLOGENOMIC RETRODICTION
UNCOVERS COEVOLUTION BETWEEN 
tRNA SUBSTRUCTURES AND 
INTERACTING aaRS PROTEIN 
DOMAINS 

In the studies mentioned above, phylo- 
genetic analysis of nucleic acid structure 
was directly derived from structural topol- 
ogy and the thermodynamics of tRNA 
(Caetano-Anolles, 2002a,b; Sun et al., 
2007; Sun and Caetano-Anolles, 2008a), 
taking unique advantage of links that exist 
between secondary structure and confor- 
mation, dynamics, and adaptation (Bailor 
et al., 2010). Specifically, a census of geometrical
features that describe the length
and topology of tRNA substructures (such
as stem and non-paired segments) or statistical
features describing their stability
and conformational diversity was analyzed
with modern phylogenetic methods
to produce phylogenetic trees of molecules
(ToMs) and trees of substructures (ToSs)
that portray the history of the system
(molecules) or its component parts (substructures),
respectively. Figure 1C shows
a ToS that describes the evolution of 
stem substructures of the tRNA molecule 
and of early evolving stem substructures 
of rRNA. The trees that are produced 
are rooted using a phylogenetic process 
model that complies with Weston's gen- 
erality criterion. The model automatically 
roots the trees by assuming conforma- 
tional stability increases in evolution as 
structures become canalized (Sun et al., 
2010). The validity of polarization and 
rooting depends on the axiomatic com- 
ponent of character transformation, which 
is falsifiable and supported by consider- 
able evidence (e.g., thermodynamic and 
phylogenetic; Sun et al., 2010). 
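
One of the statistical features used as a phylogenetic character is the Shannon entropy of the base-pairing probability matrix, a measure of conformational diversity. The sketch below shows one common way to compute such an entropy in Python. The normalization by sequence length, the use of base-2 logarithms, the function name, and the toy probabilities are all assumptions made for illustration; the exact formula used in the cited studies is not given in this text.

```python
import math


def shannon_entropy_bpp(bpp, n_bases):
    """Length-normalized Shannon entropy (bits per base) of pairing probabilities.

    bpp maps a pair of positions (i, j), with i < j, to the probability that
    the two bases are paired. Pairs with zero probability contribute nothing.
    """
    total = 0.0
    for probability in bpp.values():
        if probability > 0.0:
            total -= probability * math.log2(probability)
    return total / n_bases


# Toy, made-up probabilities for a 10-base molecule (not real tRNA data).
# A single well-defined stem: every pair is certain.
ordered = {(0, 9): 1.0, (1, 8): 1.0}
# Two competing stems of equal weight: every pair is uncertain.
diffuse = {(0, 9): 0.5, (0, 8): 0.5, (1, 8): 0.5, (1, 7): 0.5}

entropy_ordered = shannon_entropy_bpp(ordered, n_bases=10)
entropy_diffuse = shannon_entropy_bpp(diffuse, n_bases=10)
print(entropy_ordered, entropy_diffuse)
```

Under the polarization model described above, the lower-entropy (more ordered) state would be treated as derived and the higher-entropy state as ancestral.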

While ToSs are powerful retrodiction
statements that unfold the history of
RNA accretion (Sun and Caetano-Anolles,
2008a,b,c, 2009, 2010; Sun et al., 2007;
Harish and Caetano-Anolles, 2012), the 
gradual appearance of protein domains 
in evolutionary history can be inferred 
from phylogenomic trees of domains 
(ToDs) (Figure 1B) (Caetano-Anolles and
Caetano-Anolles, 2003) and can illus- 
trate the establishment of intermolecu- 
lar interactions in evolution. Domains are 
structural and evolutionary units of pro- 
teins that are highly conserved (Caetano- 
Anolles et al., 2009). The evolutionary 
accumulation of these units unfolds recur- 
rence patterns that encompass the entire 
history of proteins and can be mined with 
suitable phylogenomic methods. ToDs are 
derived from a structural census of protein 
domains in the proteomes of hundreds 
to thousands of genomes that have been 
completely sequenced. The fold structures 
of domains are defined using the differ- 
ent levels of structural abstraction of the 
accepted classification gold standards, the 
SCOP (Murzin et al., 1995) or CATH
(Orengo et al., 1997) databases. Timelines
of domain innovation are then derived 
directly from the trees taking advan- 
tage of their highly imbalanced nature. 
Imbalance unfolds when the splitting of 
lineages depends on an evolving "heri- 
table" trait (Heard, 1996). In our case, 
the evolving trait is the gradual accu- 
mulation of domains in proteomes and 
the semipunctuated discovery of new fold 
structures (made evident for example in 
simulations; Zeldovich et al., 2007). The
predictive power of ToDs is consider- 
able (Caetano-Anolles and Seufferheld, 
2013) and central for the history of 
tRNA, as ToDs have established the evo- 
lutionary history of aaRS domain struc- 
tures and their associated coevolving tRNA 
molecules (Caetano-Anolles et al., 2013).
The timeline of evolutionary appearance 
of fold families revealed the early emer- 
gence of the "operational" RNA code 
linked to the specificities of synthetases 
that were homologous to the catalytic 
domains of modern TyrRS and SerRS pro- 
tein enzymes. These archaic synthetases 
interacted with the "top half" of tRNA 
and were capable of peptide bond formation
and aminoacylation (Caetano-Anolles
et al., 2013). The timeline also
showed the late implementation of the 
standard genetic code with the late appear- 
ance of anticodon-binding domains that 
interacted with the "bottom half" of 
tRNA. Figure 1A shows a representative
aaRS enzyme and the tight coevolution- 
ary link between aaRS domains and tRNA 
arms. Remarkably, structural phyloge- 
nomic retrodictions indicate that genet- 
ics arose through episodes of structural 
recruitment as an exacting mechanism 
that favored flexibility and folding of the 
emergent proteins (Caetano-Anolles et al.,
2013). These enhancements of phenotypic 
robustness matched evolutionary trends of 

2013) and are compatible with recent sim- 
ulations of the origin of the genetic code 
(Jee et al., 2013). 
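
The census step that feeds a ToD can be pictured as turning raw domain counts into ordered multi-state characters. The following Python sketch assumes a log transform followed by rescaling within each proteome to a fixed number of states; this normalization, the number of states, the function name, and all counts are hypothetical choices for illustration and are not stated in this text.

```python
import math


def abundance_to_states(abundances, n_states=21):
    """Map raw domain counts in one proteome to ordered states 0..n_states-1.

    Counts are log-transformed (log(count + 1)) and rescaled by the largest
    value in the proteome, so the most abundant domain gets the top state.
    """
    logs = [math.log(count + 1) for count in abundances]
    largest = max(logs)
    if largest == 0:
        return [0] * len(logs)
    return [round(value / largest * (n_states - 1)) for value in logs]


# Hypothetical domain structures and made-up counts in two proteomes.
domains = ["fold_A", "fold_B", "fold_C", "fold_D"]
proteome_counts = {
    "proteome_1": [0, 1, 9, 99],
    "proteome_2": [2, 2, 40, 10],
}

# Each proteome is one character; its column holds a state for every domain.
character_columns = {
    name: abundance_to_states(counts) for name, counts in proteome_counts.items()
}

# Data matrix for a tree of domains: rows are domains (taxa),
# columns are proteomes (characters).
matrix = {
    domain: [character_columns[name][index] for name in proteome_counts]
    for index, domain in enumerate(domains)
}

for domain, states in matrix.items():
    print(domain, states)
```

Because the states are linearly ordered, a parsimony search can count a change from state 3 to state 5 as two steps, which is the kind of ordered, reversible transformation the caption of Figure 1B describes.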

ABUNDANCE OF PROTEIN DOMAINS 
IN PROTEOMES FOLLOWS AN 
EVOLUTIONARY CLOCK 

The history of RNA does not repre- 
sent a phylogenetic statement that applies 
to the entire world of RNA molecules. 
Consequently, it cannot be placed within 
a global historical context. In contrast, the 
history of protein domains inferred from 
ToDs follows a global molecular clock of 
fold structures that spans 3.8 billion years 
(Gy) of evolution (Wang et al., 2011).
Traditionally, molecular clocks are based 
on rates of change in protein or nucleic 
acid sequences, which are limited by his- 
torical information existing in the individ- 
ual protein or nucleic acid molecules being 
studied (Zuckerkandl and Pauling, 1965; 
Ayala et al., 1998). These clocks are there- 
fore constrained by the highly dynamic 
nature of sequence change, including the 
problems of mutational saturation and 
rate heterogeneity (heterotachy). In con- 
trast, molecular structures exhibit char- 
acteristics of recurrent change that are 
much more stable. The clocks of domain 
structures were calibrated by associating 
diagnostic domain structures with mul- 
tiple geological ages derived from the 
study of fossils and microfossils, geochem- 
ical, biochemical, and biomarker data. 
Remarkably, excellent linear correlations 
between the ages of domain structures at 
fold and fold superfamily levels of SCOP 
and geological timescales were identified 
and used to time fundamental evolution- 
ary events (Wang et al., 2011). These 
events included the rise of planetary oxy- 
gen and episodes of organismal diversi- 
fication (Wang et al., 2011; Kim et al., 
2012). 
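
The calibration described above amounts to fitting a straight line between a relative age read from the tree and a geological age. The sketch below is a minimal Python illustration using ordinary least squares; the relative-age scale (0 for the oldest domain, 1 for the most recent), the calibration points, and the function names are all hypothetical and are not values from Wang et al. (2011).

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up calibration points: (relative age on the tree, geological age in Gy).
calibration = [(0.0, 3.8), (0.25, 2.9), (0.5, 1.9), (0.75, 1.0), (1.0, 0.0)]
relative_ages = [point[0] for point in calibration]
geological_ages = [point[1] for point in calibration]

slope, intercept = fit_line(relative_ages, geological_ages)


def geological_age(relative_age):
    """Convert a relative age into Gy before present using the fitted line."""
    return slope * relative_age + intercept


print(f"slope = {slope:.2f} Gy per unit of relative age, intercept = {intercept:.2f} Gy")
print(f"estimated age at relative age 0.5: {geological_age(0.5):.2f} Gy")
```

Once such a line is fitted, any domain structure with a relative age on the tree can be assigned an approximate geological age, which is how interactions are placed on the timescale in the next section.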

THE CLOVERLEAF STRUCTURE OF tRNA 
UNFOLDS EARLY IN EVOLUTION, PRIOR 
TO THE APPEARANCE OF A 
FUNCTIONAL RIBOSOMAL 
MACHINERY 

Assuming that the age of interactions that 
are established between RNA and pro- 
teins is the age of the interacting com- 
ponents, we tracked the appearance of 
domains in ribonucleoprotein complexes 
along the evolutionary timeline and used
the molecular clock of folds to link interactions
to a geological timescale (Figure 1C).
The catalytic domains of classes I and 
II aaRS enzymes (belonging to SCOP 
families c.26.1.1 and d.104.1.1, respectively)
are the first to appear in the timeline
~3.7 Gy ago (Caetano-Anolles et al.,
2013). These domains harbor pre-transfer 
and post-transfer editing and trans-editing 
activities. The most ancient of these edit- 
ing structures, present in the catalytic 
domains of TyrRS, SerRS, and LeuRS, 
involve interactions with the oldest type 
II cognate tRNAs, which harbor a long 
variable loop necessary for tRNA recog- 
nition (Sun and Caetano-Anolles, 2008c). 
While the evolutionary significance of 
the variable loop in tRNA-aaRS interac- 
tions is unclear (Sun and Caetano-Anolles, 
2008c), its late evolutionary appearance 
could simply represent the shift or recruit- 
ment of an archaic interacting region of 
the molecule. Interactions of tRNA with 
the "ValRS/IleRS/LeuRS editing" domain 
(SCOP family b.51.1.1) (Hale et al.,
1997) suggest the D arm was already 
present ~3.3 Gy ago, which is derived 
compared to the acceptor stem (Sun 
and Caetano-Anolles, 2008a). The late 
appearance of anticodon-binding domains 
(beginning with SCOP family c.51.1.1) in
well over half of aaRSs ~3 Gy ago con- 
firms that the full "bottom half" of tRNA 
and its anticodon loop identity elements 
unfolded completely before the onset 
of planetary oxygenation and cellular 
diversification ~2.9 Gy ago. 

Comparing the natural history of tRNA 
(Sun and Caetano-Anolles, 2008a) and the 
ribosome (Harish and Caetano-Anolles, 
2012) within the framework of the inter- 
acting proteins shows the remarkable 
functional connection of the cloverleaf 
structure and ribosomal functionality 
(Figure 1C). The origin of r-proteins in
interaction with helix 44 (the riboso- 
mal ratchet) of the small subunit (SSU) 
rRNA occurred 3.3-3.4 Gy ago once the 
tRNA molecule unfolded its anticodon 
arm. This manifests in the pivotal role 
of one of the two earliest r-proteins, S12, 
in tRNA selection (anticipated by Ogle 
and Ramakrishnan, 2005), which is medi- 
ated by a bonding network connecting 
two sites in S12 to the anticodon and the 
CCA arm of the tRNA-elongation factor 

bound state (Li et al., 2008). Similarly,
the full cloverleaf structure of tRNA 
was already present when the riboso- 
mal peptidyl transferase center (PTC) 
responsible for modern protein synthe- 
sis appeared in the emerging domain V 
of the large subunit of rRNA 2.8-3.1 Gy 
ago. This is an expected outcome since 
the structurally mature 70-80 Å-long and
20-25 Å-wide tRNA molecule must traverse
a path of ~100 Å and physically span
the intersubunit interface of the ribosomal 
core for the ensemble to be fully func- 
tional (Agirrezabala and Frank, 2009). 
Remarkably, this late development of the 
ribosomal core coincided with the appearance
of pathways of amino acid (Kim et al.,
2012) and purine nucleotide biosynthesis
(Caetano-Anolles and Caetano-Anolles,
2013). This suggests that tRNA and
ribosomal functionality (anticodon loop 
recognition, decoding, protein biosynthe- 
sis) and modern metabolic pathways for 
amino acids and nucleotides developed 
concurrently, supporting the co-evolution 
theory of the genetic code (Wong, 2005). 

CONCLUSION 

The natural and overlapping history of 
tRNA and rRNA reveals that: (1) the tRNA
cloverleaf structure unfolded prior to the 
appearance of a fully functional ribosomal 
core, (2) the primordial role of tRNA, orig- 
inally linked to archaic dipeptide-forming 
synthetases, was coopted into modern 
translation functions once anticodon- 
loop specificities appeared concurrently 
with the PTC, and (3) the emergence 
of modern genetics unfolded relatively 
quickly in a period of 0.3-0.5 Gy, start- 
ing with anticodon-loop recognition 
and once the cloverleaf structure had 
formed. 